Convolutional Neural Network

In this second exercise notebook we will experiment with Convolutional Neural Networks (CNNs).

As you should have seen, a CNN is a feed-forward neural network typically composed of Convolutional, MaxPooling and Dense layers.

If the task implemented by the CNN is a classification task, the last Dense layer should use the Softmax activation, and the loss should be the categorical crossentropy.
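To make the softmax and categorical crossentropy pairing concrete, here is a minimal pure-Python sketch. The function names and the example scores are hypothetical and chosen for illustration; this is not how Keras implements them internally.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def categorical_crossentropy(one_hot_target, probs, eps=1e-12):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(one_hot_target, probs))

scores = [2.0, 1.0, 0.1]  # hypothetical outputs of the last Dense layer
probs = softmax(scores)

# The loss is small when the true class has the highest score,
# and large when the true class has the lowest score.
loss_correct = categorical_crossentropy([1, 0, 0], probs)
loss_wrong = categorical_crossentropy([0, 0, 1], probs)
print(probs, loss_correct, loss_wrong)
```

The loss only looks at the probability assigned to the true class, which is why the one-hot encoding of the labels matters later on.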

Reference: https://github.com/fchollet/keras/blob/master/examples/cifar10_cnn.py

Network Topology Model

A simple CNN, with one input branch and one output branch, can be defined using a Sequential model by stacking all its layers together.

In this exercise we want to build a (quite shallow) network which contains two [Convolution, Convolution, MaxPooling] stages, and two Dense layers.

To test a different optimizer, we will use AdaDelta, which is a bit more complex than plain SGD with momentum.
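As a rough illustration of why AdaDelta is more involved than SGD, the sketch below applies the AdaDelta update rule to a one-dimensional toy problem. It keeps two running averages (of squared gradients and of squared updates) instead of a single learning rate. The helper name and the values of `rho` and `eps` are assumptions made for this example, and the Keras implementation may differ in its details.

```python
import math

def adadelta_minimize(grad_fn, x0, n_steps=50, rho=0.95, eps=1e-6):
    """Run a few AdaDelta steps on a scalar parameter (illustrative only)."""
    x = x0
    avg_sq_grad = 0.0   # running average of squared gradients
    avg_sq_delta = 0.0  # running average of squared updates
    for _ in range(n_steps):
        g = grad_fn(x)
        avg_sq_grad = rho * avg_sq_grad + (1 - rho) * g * g
        # The step size is the ratio of the two RMS values; no learning rate is needed.
        delta = -math.sqrt(avg_sq_delta + eps) / math.sqrt(avg_sq_grad + eps) * g
        avg_sq_delta = rho * avg_sq_delta + (1 - rho) * delta * delta
        x += delta
    return x

x_start = 1.0
x_end = adadelta_minimize(lambda x: 2.0 * x, x_start)  # gradient of f(x) = x**2
print(x_start, x_end)
```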


In [2]:
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Flatten, Activation
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.optimizers import Adadelta

input_shape = (3, 32, 32)
nb_classes = 10

## [conv@32x3x3+relu]x2 --> MaxPool@2x2 --> DropOut@0.25 -->
## [conv@64x3x3+relu]x2 --> MaxPool@2x2 --> DropOut@0.25 -->
## Flatten--> FC@512+relu --> DropOut@0.5 --> FC@nb_classes+SoftMax
## NOTE: in each pair of Conv layers, the first must use `border_mode="same"` and the second `border_mode="valid"`

## your code here


Using TensorFlow backend.

In [ ]:
# %load solutions/sol_223.py

Understanding layer shapes

An important feature of Keras layers is that each of them has an input_shape attribute, which you can use to visualize the shape of the input tensor, and an output_shape attribute, for inspecting the shape of the output tensor.

As we can see, the input shape of the first convolutional layer corresponds to the input_shape attribute (which must be specified by the user).

In this case, it is a 32x32 image with three color channels.

Since this convolutional layer has border_mode set to same, its output width and height remain unchanged, and the number of output channels is equal to the number of filters learned by the layer, 32.

The second convolutional layer of each stage, instead, has the default border_mode (valid), and therefore reduces width and height by $(k-1)$, where $k$ is the size of the kernel.

MaxPooling layers, instead, reduce width and height of the input tensor, but keep the same number of channels. Activation layers, of course, don't change the shape.


In [4]:
for i, layer in enumerate(model.layers):
    print("Layer", i, "\t", layer.input_shape, "\t", layer.output_shape)


Layer 0 	 (None, 3, 32, 32) 	 (None, 32, 32, 32)
Layer 1 	 (None, 32, 32, 32) 	 (None, 32, 32, 32)
Layer 2 	 (None, 32, 32, 32) 	 (None, 32, 30, 30)
Layer 3 	 (None, 32, 30, 30) 	 (None, 32, 30, 30)
Layer 4 	 (None, 32, 30, 30) 	 (None, 32, 15, 15)
Layer 5 	 (None, 32, 15, 15) 	 (None, 32, 15, 15)
Layer 6 	 (None, 32, 15, 15) 	 (None, 64, 15, 15)
Layer 7 	 (None, 64, 15, 15) 	 (None, 64, 15, 15)
Layer 8 	 (None, 64, 15, 15) 	 (None, 64, 13, 13)
Layer 9 	 (None, 64, 13, 13) 	 (None, 64, 13, 13)
Layer 10 	 (None, 64, 13, 13) 	 (None, 64, 6, 6)
Layer 11 	 (None, 64, 6, 6) 	 (None, 64, 6, 6)
Layer 12 	 (None, 64, 6, 6) 	 (None, 2304)
Layer 13 	 (None, 2304) 	 (None, 512)
Layer 14 	 (None, 512) 	 (None, 512)
Layer 15 	 (None, 512) 	 (None, 512)
Layer 16 	 (None, 512) 	 (None, 10)
Layer 17 	 (None, 10) 	 (None, 10)
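The spatial sizes printed above can be reproduced by hand. The following sketch applies the rules described in this section (same keeps the size, valid removes $k-1$, 2x2 pooling halves and rounds down) to a 32x32 input. The helper names are hypothetical and not part of Keras.

```python
def conv_output_size(size, kernel, border_mode):
    """Width/height after a stride-1 convolution."""
    if border_mode == "same":
        return size
    if border_mode == "valid":
        return size - (kernel - 1)
    raise ValueError("unknown border_mode: %r" % border_mode)

def pool_output_size(size, pool):
    """Width/height after non-overlapping max pooling (rounded down)."""
    return size // pool

size = 32
trace = [size]
for _ in range(2):  # two [Convolution, Convolution, MaxPooling] stages
    size = conv_output_size(size, 3, "same")
    trace.append(size)
    size = conv_output_size(size, 3, "valid")
    trace.append(size)
    size = pool_output_size(size, 2)
    trace.append(size)

flattened = 64 * size * size  # 64 channels after the second stage
print(trace, flattened)
```

The final value is the input size of the first Dense layer, which matches the 2304 reported for the Flatten layer.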

Understanding weights shape

In the same way, we can visualize the shape of the weights learned by each layer. In particular, Keras lets you inspect weights by using the get_weights method of a layer object. For layers with learnable parameters, this returns a list with two elements, the first one being the weight tensor and the second one being the bias vector.

Of course, MaxPooling layers don't have any weight tensor, since they don't have learnable parameters.

Convolutional layers, instead, learn a $(n_o, n_i, k, k)$ weight tensor, where $k$ is the size of the kernel, $n_i$ is the number of channels of the input tensor, and $n_o$ is the number of filters to be learned. For each of the $n_o$ filters, a bias is also learned.

Dense layers learn a $(n_i, n_o)$ weight tensor, where $n_o$ is the output size and $n_i$ is the input size of the layer. Each of the $n_o$ neurons also has a bias.


In [5]:
for i, layer in enumerate(model.layers):
    if len(layer.get_weights()) > 0:
        print("Layer", i, "\t", layer.get_weights()[0].shape, "\t", layer.get_weights()[1].shape)


Layer 0 	 (32, 3, 3, 3) 	 (32,)
Layer 2 	 (32, 32, 3, 3) 	 (32,)
Layer 6 	 (64, 32, 3, 3) 	 (64,)
Layer 8 	 (64, 64, 3, 3) 	 (64,)
Layer 13 	 (2304, 512) 	 (512,)
Layer 16 	 (512, 10) 	 (10,)
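From these shapes we can count the learnable parameters of each layer. The sketch below uses the formulas from the text, $n_o \cdot n_i \cdot k \cdot k + n_o$ for a convolutional layer and $n_i \cdot n_o + n_o$ for a Dense layer. The helper names are hypothetical.

```python
def conv_params(n_out, n_in, k):
    """Weights of shape (n_out, n_in, k, k) plus one bias per filter."""
    return n_out * n_in * k * k + n_out

def dense_params(n_in, n_out):
    """Weights of shape (n_in, n_out) plus one bias per output neuron."""
    return n_in * n_out + n_out

counts = {
    "Layer 0": conv_params(32, 3, 3),
    "Layer 2": conv_params(32, 32, 3),
    "Layer 6": conv_params(64, 32, 3),
    "Layer 8": conv_params(64, 64, 3),
    "Layer 13": dense_params(2304, 512),
    "Layer 16": dense_params(512, 10),
}
total = sum(counts.values())

for name, n in counts.items():
    print(name, "\t", n)
print("Total", "\t", total)
```

Note that the first Dense layer alone accounts for the large majority of the parameters.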

Training the network

We will train our network on the CIFAR10 dataset, which contains 50,000 32x32 color training images, labeled over 10 categories, and 10,000 test images.

As this dataset is also included in Keras datasets, we just ask the keras.datasets module for the dataset.

Training and test images are normalized to lie in the $\left[0,1\right]$ interval.


In [6]:
from keras.datasets import cifar10
from keras.utils import np_utils

(X_train, y_train), (X_test, y_test) = cifar10.load_data()
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
X_train = X_train.astype("float32")
X_test = X_test.astype("float32")
X_train /= 255
X_test /= 255
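To see what the two preprocessing steps do, here is a small pure-Python sketch of one-hot encoding and pixel scaling. It works on flat lists rather than NumPy arrays, and the helper names are hypothetical. It only mirrors the idea behind `to_categorical` and the division by 255.

```python
def to_one_hot(labels, n_classes):
    """Encode each integer label as a vector with a single 1."""
    rows = []
    for label in labels:
        row = [0.0] * n_classes
        row[label] = 1.0
        rows.append(row)
    return rows

def scale_pixels(pixels, max_value=255.0):
    """Map 8-bit pixel intensities to the [0, 1] interval."""
    return [p / max_value for p in pixels]

one_hot = to_one_hot([3, 0, 9], 10)   # three example labels, ten classes
scaled = scale_pixels([0, 128, 255])  # three example pixel intensities
print(one_hot[0])
print(scaled)
```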

To reduce the risk of overfitting, we also apply some image transformations, such as shifts and flips (rotations are also supported, but are disabled in the cell below). All these can be easily implemented using the Keras ImageDataGenerator.

Warning: the following cells may be computationally intensive.


In [7]:
from keras.preprocessing.image import ImageDataGenerator

generated_images = ImageDataGenerator(
    featurewise_center=True,  # set input mean to 0 over the dataset
    samplewise_center=False,  # set each sample mean to 0
    featurewise_std_normalization=True,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,  # divide each input by its std
    zca_whitening=False,  # apply ZCA whitening
    rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.2,  # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.2,  # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,  # randomly flip images horizontally
    vertical_flip=False)  # randomly flip images vertically

generated_images.fit(X_train)
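The options enabled above can be illustrated with a toy sketch: standardization subtracts a mean and divides by a standard deviation computed from the data, and a horizontal flip reverses each row of an image. The code below works on plain lists with hypothetical helper names. How Keras aggregates the statistics over the dataset is not reproduced here.

```python
import random
import statistics

def featurewise_standardize(values):
    """Shift to zero mean and scale to unit standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def random_horizontal_flip(image, rng, prob=0.5):
    """Reverse every row of the image with probability `prob`."""
    if rng.random() < prob:
        return [row[::-1] for row in image]
    return image

standardized = featurewise_standardize([0.0, 0.5, 1.0, 0.5])

image = [[1, 2, 3],
         [4, 5, 6]]
rng = random.Random(0)
always_flipped = random_horizontal_flip(image, rng, prob=1.0)
never_flipped = random_horizontal_flip(image, rng, prob=0.0)
print(standardized, always_flipped, never_flipped)
```

Because the mean and standard deviation depend on the data, the generator has to see the training set once (the call to `fit`) before it can produce batches.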

Now we can start training.

At each iteration, a batch of 500 images is requested from the ImageDataGenerator object and then fed to the network.


In [10]:
X_train.shape


Out[10]:
(50000, 3, 32, 32)

In [11]:
gen = generated_images.flow(X_train, Y_train, batch_size=500, shuffle=True)
X_batch, Y_batch = next(gen)

In [12]:
X_batch.shape


Out[12]:
(500, 3, 32, 32)

In [ ]:
from keras.utils import generic_utils

n_epochs = 2
for e in range(n_epochs):
    print('Epoch', e)
    print('Training...')
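    progbar = generic_utils.Progbar(X_train.shape[0])
    samples_seen = 0

    for X_batch, Y_batch in generated_images.flow(X_train, Y_train, batch_size=500, shuffle=True):
        loss = model.train_on_batch(X_batch, Y_batch)
        # loss[0] assumes the model was compiled with extra metrics, so that
        # train_on_batch returns a list whose first entry is the loss
        progbar.add(X_batch.shape[0], values=[('train loss', loss[0])])
        samples_seen += X_batch.shape[0]
        # flow() yields batches indefinitely, so stop after one pass over the data
        if samples_seen >= X_train.shape[0]:
            break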
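Data generators such as the one returned by `flow` yield batches indefinitely, so a loop over them needs its own stopping condition to mark the end of an epoch. The following pure-Python sketch shows the pattern with a hypothetical `endless_batches` generator standing in for the Keras one.

```python
import random

def endless_batches(samples, batch_size, rng):
    """Yield shuffled batches forever, reshuffling after every full pass."""
    while True:
        order = list(samples)
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            yield order[start:start + batch_size]

samples = list(range(50))  # stand-in for the training set
seen = 0
n_batches = 0
for batch in endless_batches(samples, 5, random.Random(0)):
    seen += len(batch)
    n_batches += 1
    if seen >= len(samples):  # one full pass over the data: end of the epoch
        break

print(n_batches, seen)
```

With 50,000 training images and a batch size of 500, the same reasoning gives 100 batches per epoch.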